17 research outputs found

    An Architecture for Data and Knowledge Acquisition for the Semantic Web: the AGROVOC Use Case

    We are surrounded by ever-growing volumes of unstructured and weakly structured information, and for a human being, domain expert or not, it is nearly impossible to read, understand and categorize such information in a reasonable amount of time. Moreover, different user categories have different expectations: end users need easy-to-use tools and services for specific tasks, knowledge engineers require robust tools for knowledge acquisition, knowledge categorization and semantic resource development, while developers of semantic applications demand flexible frameworks for fast, easy and standardized development of complex applications. This work is an experience report on the use of the CODA framework for rapid prototyping and deployment of knowledge acquisition systems for RDF. The system integrates independent NLP tools and custom libraries complying with UIMA standards. For our experiment, a document set has been processed to populate the AGROVOC thesaurus with two new relationships.

    Construction of a medical corpus based on information extraction results

    The paper presents a method for automatically constructing a semantically annotated corpus from the results of a rule-based information extraction (IE) application. The corpus is built by running existing programs for text tokenization and morphological analysis and combining their results with domain-related correction rules. We reuse the specialized IE system to obtain a corpus annotated on the semantic level. The texts included in the corpus are Polish free-text clinical data. We present the documents (diabetic patients' discharge records), the structure of the corpus annotation, and the methods for obtaining the annotations. Initial evaluations based on manual verification of a selected data subset are also presented. The corpus, once manually corrected, is designed to be used for developing supervised machine learning models for IE applications.

    Medical text data anonymization

    The paper discusses a program for removing patient identification information from hospital discharge documents in order to make them available for scientific research, e.g. for designing information extraction systems. The presented method allows de-anonymization of documents using a key-code file that is created on the basis of a patient's surname, forename and date of birth. Problems of normalizing the crucial data used in creating the key-code file are presented.

    Introducing the Public Transport Domain to the Web of Data
